Deep Spatial Domain Generalization
Dazhou Yu, Guangji Bai, Yun Li, Liang Zhao
Department of Computer Science
Emory University
Atlanta, USA
{Dazhou.Yu, Guangji.Bai, Yun.Li, Liang.Zhao}@emory.edu
Abstract—Spatial autocorrelation and spatial heterogeneity
widely exist in spatial data, which makes traditional machine
learning models perform poorly. Spatial domain generalization
is a spatial extension of domain generalization, which can
generalize to unseen spatial domains in continuous 2D space.
Speciﬁcally, it learns a model under varying data distributions
that generalizes to unseen domains. Although tremendous success
has been achieved in domain generalization, there exist very
few works on spatial domain generalization. The advancement
of this area is challenged by: 1) Difﬁculty in characterizing
spatial heterogeneity, and 2) Difﬁculty in obtaining predictive
models for unseen locations without training data. To address
these challenges, this paper proposes a generic framework for
spatial domain generalization. Specifically, we develop the spatial
interpolation graph neural network¹ that handles spatial data as
a graph and learns the spatial embedding on each node and their
relationships. The spatial interpolation graph neural network
infers the spatial embedding of an unseen location during the test
phase. Then the spatial embedding of the target location is used
to decode the parameters of the downstream-task model directly
on the target location. Finally, extensive experiments on ten real-
world datasets demonstrate the proposed method’s strength.
Index Terms—unseen domain generalization, spatial, GNN,
edge embedding, interpolation
I. INTRODUCTION
Traditional machine learning models typically rely on
the independent and identically distributed (i.i.d.) assumption,
meaning the data samples are independent of each other
and follow the same distribution. However, this assumption
generally does not hold for spatial data, which exhibit spatial
autocorrelation and heterogeneity. Spatial autocorrelation makes
the spatial location of a sample and its corresponding spatial
attributes informative, so samples are not independent and
identically distributed (non-i.i.d.). Spatial heterogeneity includes
spatial non-stationarity and spatial anisotropy. Spatial non-stationarity
means that the sample distribution varies across locations. Spatial
anisotropy means that the spatial dependency between sample
locations is non-uniform along different directions. For example,
the air pollution concentration at a location is usually a
complex function of various independent variables, but the
relative importance of these variables changes
with location: population density and distance from
emission sources play an essential role in PM2.5 pollution
concentration in urban built-up areas, whereas in rural areas
relative humidity contributes greatly to the diffusion of PM2.5.
¹ https://github.com/dyu62/Deep-domain-generalization
This requires customizing the model for different locations.
However, the training set usually contains observations
from only a limited number of locations.
Hence, prediction tasks must often be executed
at locations unseen in the training set. This results in a very
challenging task: generating the predictive model for a
new location without any training data. This paper focuses on
this new problem, which we call spatial domain generalization,
a spatial extension of domain generalization [1].
Domain generalization learns a model under varying data
distributions that generalizes to unseen domains. It is derived
from and goes beyond domain adaptation, which builds the
bridge between source and target domains by characterizing
the transformation between the data from these domains
[2]. Current domain generalization only covers domains with
categorical indices [1] or time sequential domains [3] but
has not covered spatial domains which require considering
unique problems such as spatial autocorrelation and spatial
heterogeneity. Another thread of research comes from the
spatial data mining area, where techniques such
as geographically weighted regression (GWR) [4] have been
proposed to handle spatial heterogeneity. These are mostly
prescribed models in which the underlying spatial distribution and
correlation must be presumed and predefined by the model designer,
which may not reflect the true spatial process that is usually
complex and unknown. In particular, these models only consider
distances and ignore other spatial information such as direction.
Moreover, these models share the feature extractor across all
locations and only generate different coefficients in the last
layer, so they cannot capture complex heterogeneity within
the data.
Spatial domain generalization is challenged by several
critical bottlenecks. 1) Difficulty in characterizing
spatial heterogeneity. The data distribution is not identical
across the entire space and changes with the characteristics
and confounding factors of each location. A simple global model cannot
explain the relationships between variables, so the
model must vary over space to reflect the structure
within the data. Modeling spatially changing relationships
requires making the model location-sensitive. Feeding the
coordinate values as part of the input features is intuitive. However,
such a method cannot exploit the dependency of the other features
on location, or the confounding factors that vary
among locations. It is necessary yet difficult to quantify
how spatial heterogeneity impacts the models,
since there is no "one-size-fits-all" rule for it. Techniques that
can learn this automatically from the data are therefore highly
desirable yet challenging to develop. 2) Difficulty in obtaining predictive
models for unseen locations without training data. Due to
spatial heterogeneity, the local models at different locations
can differ substantially in order to capture the relationships
between predictors and the target variable. When training data
is not provided at some locations, the method must be able
to generalize to these unseen locations. This is as
difficult as zero-shot learning.
arXiv:2210.00729v2 [cs.LG] 28 Dec 2022
In order to address the above challenges, we propose a
generic framework for deep spatial domain generalization,
which generates the predictive models for any unseen spatial
domains. More speciﬁcally, to address the ﬁrst challenge, we
propose a novel spatial interpolation graph neural network
(SIGNN) to learn the spatial embedding of each location and
the relationships between them in the training set and infer the
spatial embedding of unseen locations during the test phase.
The spatial embedding of the target location is then used to
decode the parameterized model directly without training data
on the target location. This solves the second challenge. Our
contributions include:
• We propose a framework for spatial domain generalization.
The framework does not assume the data
distribution and learns the spatial embeddings for all the
locations in the training set in an end-to-end manner. It is
also compatible with general predictive task models such
as regression models and multi-layer perceptrons (MLP).
• We develop the spatial interpolation graph neural
network. It handles spatial data as a graph and uses the
edge representation to learn the spatial embedding on each
node and their relationships by doing graph convolution
operations. It also interpolates the spatial embedding at
any location so our method can generalize to unseen
locations.
• We conduct extensive experiments. We validate the
efficacy of our method on ten real-world datasets for
classification and regression tasks. Our method outperforms
state-of-the-art models on most of the tasks.
II. RELATED WORK
In this section, we summarize the works in the ﬁeld of
domain adaptation and domain generalization. Machine learning
systems often assume that training and test data follow the
same distribution, which, however, usually cannot be satisﬁed
in practice. Domain Adaptation (DA) aims to build the bridge
between source and target domains by characterizing the
transformation between the data from these domains, and it has
received great attention from researchers in the past decade
[2], [5], [6]. Under the
big umbrella of DA, continuous domain adaptation considers
the problem of adapting to target domains where the domain
index is a continuous variable (temporal DA is a special case
when the domain index is 1D). Approaches to tackling such
problems can be broadly classiﬁed into three categories: (1)
biasing the training loss towards future data via transportation
of past data [7], (2) using time-sensitive network parameters
and explicitly controlling their evolution along time [8], (3)
learning representations that are time-invariant using adversarial
methods [9]. The ﬁrst category augments the training data,
the second category reparameterizes the model, and the third
category redesigns the training objective. However, data may
not be available for the target domain, or it may not be possible
to adapt the base model, thus requiring Domain Generalization (DG).
A diversity of DG methods have been proposed in recent
years. According to [10], existing DG methods can be cat-
egorized into the following three groups, namely: (1) Data
manipulation: This category of methods focuses on manipulat-
ing the inputs to assist in learning general representations. There
are two kinds of popular techniques along this line: a). Data
augmentation [11], which is mainly based on augmentation,
randomization, and transformation of input data; b). Data
generation [12], which generates diverse samples to help
generalization. (2) Representation learning: This category of
methods is the most popular in domain generalization. There are
two representative techniques: a). Domain-invariant representation
learning [5], which uses kernel methods, adversarial training,
explicit feature alignment between domains, or invariant
risk minimization to learn domain-invariant representations;
b). Feature disentanglement [13], which tries to disentangle
the features into domain-shared or domain-speciﬁc parts for
better generalization. (3) Learning strategy: This category of
methods focuses on exploiting the general learning strategy to
promote the generalization capability.
III. METHODOLOGY
In this section, we ﬁrst provide the problem formulation and
the challenges of the problem, then we introduce our proposed
framework and how it solves the challenges.
A. Problem formulation
In this paper, we denote a geo-location by its 2D coordinate
values $s \in \mathbb{R}^2$. Each $s$ is associated with a spatial domain
$(\mathcal{X}_s \times \mathcal{Y}_s)$, from which we may have a set of samples
$(x_s, y_s) = \{(x_i, y_i) \in (\mathcal{X}_s \times \mathcal{Y}_s)\}_{i=1}^{N_s}$,
where $x_i \in \mathcal{X}_s$ is the $i$-th input sample
and $y_i \in \mathcal{Y}_s$ is the $i$-th output sample.
For the classification problem, $y_i$ can be
further narrowed to a binary value.
In contrast to the assumption that the relationship $f$ between
the independent variables $x_i \in \mathcal{X}_s$ and the dependent
variables $y_i \in \mathcal{Y}_s$ remains unchanged throughout the space
$\mathbb{R}^2$, spatial heterogeneity
describes a condition in which the relationships between some
sets of variables $\{x_i, y_i\}$ are heterogeneous throughout space,
i.e., $f_s \neq f_{s'}$ if $s \neq s'$. A static global model cannot capture
the changes in relationships, so Domain Generalization (DG)
models that can reflect the heterogeneous relationships
within the data play a vital role in spatial analysis.
Our goal in this paper is to build a model that proactively
captures the data concept drift across different geo-locations.
Given a set of data samples $\{(x_s, y_s)\}_{s \in S_0}$ from seen domains,
where $S_0$ denotes the set of seen locations, we aim to learn the
predictive mapping functions $f_s : \mathcal{X}_s \to \mathcal{Y}_s$ for downstream
tasks such as classification or regression at location $s$. Here the
location can either be seen (i.e., $s \in S_0$) or unseen (i.e.,
$s \in (\mathbb{R}^2 - S_0)$). The former is spatial multitask learning while the
latter is spatial domain generalization. Therefore, our problem
is a generalization of both of them.

Fig. 1: Illustration of the proposed framework. The unseen location's spatial embedding is interpolated by SIGNN. The edge
representation contains both the distance and direction information. The spatial embedding is decoded to the weights of the
downstream-task model.
B. Proposed Method
1) Spatial domain generalization: We propose a bi-level
framework as shown by Fig. 1 which generates the predictive
models for any unseen spatial domains. Generally speaking,
we propose a novel spatial interpolation graph neural network
(SIGNN) to learn the target location’s spatial embedding. The
spatial embedding of the target location is then used to decode
the parameterized model directly without training data on
the target location. The general procedures of unseen domain
generalization and model training are outlined in the following
and detailed in Section III-B2.
a) Spatial K-nearest neighbor graph: For any location
$s$, we first build a spatial K-nearest neighbor graph upon
$s$ and the seen locations $S_0$, defined as
$G(s, S_0; Z) = (V(s, S_0), E(s, S_0); Z)$, where the node set
$V(s, S_0) = S_0 \cup \{s\}$
is the union of the current location $s$ and the seen locations $S_0$
defined before. If $s$ is a seen location, $V$
reduces to $S_0$. $E(s, S_0) \subseteq V \times V$ denotes the relationships
among all the locations, which will be detailed in Section
III-B3. For simplicity, we omit the arguments and use $V$ and $E$
directly in the following. Let $\mathcal{N}_i^{(K)}$ denote node $v_i$'s
K-nearest neighbors, i.e., the nodes whose Euclidean distance from $v_i$ is less
than or equal to the $K$-th smallest Euclidean distance between
any other node and $v_i$. Specifically, for each node $v_j \in \mathcal{N}_i^{(K)}$, a
directed edge $(v_j, v_i)$ exists from $v_j$ to $v_i$, so there are exactly
$K$ nodes pointing to $v_i$. $Z = \{z_s\}_{s \in S_0}$ denotes the spatial
embeddings for all the locations except the current location
$s$, namely $S_0 - \{s\}$, where $z_s$ is the spatial embedding vector
for location $s$. Here the spatial embeddings also serve as the node
features.
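The graph construction above can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation: the function name `knn_edges`, the toy coordinates, and the tie-breaking by node index are assumptions, and the sketch does not address whether an unseen node may also act as a source for seen nodes.

```python
import math

def knn_edges(coords, k):
    """Return directed edges (j, i): j is one of the k nearest neighbors of i."""
    edges = []
    for i, (xi, yi) in enumerate(coords):
        # Euclidean distance from every other node to node i.
        dists = [(math.hypot(xj - xi, yj - yi), j)
                 for j, (xj, yj) in enumerate(coords) if j != i]
        dists.sort()  # ties are broken by node index (an assumption)
        # Keep the k closest sources; each contributes an edge pointing to i.
        edges.extend((j, i) for _, j in dists[:k])
    return edges

# Three seen locations plus one target location (index 3), all hypothetical.
coords = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (0.2, 0.1)]
edges = knn_edges(coords, k=2)
incoming_to_target = sorted(j for j, i in edges if i == 3)
```

Each node receives exactly `k` incoming edges, which matches the statement that exactly K nodes point to each node.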
b) Unseen domain model generation: In spatial
domain generalization, we are interested in generating the
predictive model for an unseen location $s' \in \mathbb{R}^2 - S_0$. The
spatial embedding for location $s'$ is spatially interpolated by
our SIGNN via our newly proposed spatial interpolation graph
convolutions $a(s'; E, Z)$, which refer to the spatial embeddings
of all seen spatial locations $S_0$. The spatial embedding of
$s'$ is then fed into the model generator to generate the parameterized
function $f_{s'}$, namely the downstream task's model:
$$f_{s'} = d_\phi(g_\theta(s'; E, Z)), \tag{1}$$
where $d_\phi$ denotes the downstream-task-model generator
parameterized by $\phi$ and $g_\theta$ denotes SIGNN parameterized by $\theta$. The
downstream task can be any classification or regression task
at location $s'$, such as weather classification or air pollution
prediction. We will elaborate on the details of
transferring the output of SIGNN to a specific task's model in Section
III-B2.
c) Model Training: The above model generation for an
unseen location requires learning the spatial embeddings
$Z = \{z_s\}_{s \in S_0}$, the parameters $\theta$ of SIGNN $g_\theta$, and the parameters
$\phi$ of the model generator $d_\phi$. In the following, we introduce
how to jointly learn them in the training phase. For each seen
location, as mentioned in Section III-A, we know the input
and output data of the downstream task. Hence our training
objective is to maximize the likelihood together with the prior
$p(Z)$ by learning the unknown spatial embeddings and model
parameters,
$$\arg\max_{Z,\phi,\theta} \{\, p(Y \mid X, \phi, \theta, Z)\, p(Z) \,\}, \tag{2}$$
which is equivalent to minimizing the negative
logarithm as follows,
$$\arg\min_{Z,\phi,\theta} \{\, -\ln p(Y \mid X, \phi, \theta, Z) - \ln p(Z) \,\}, \tag{3}$$
where $Y$ and $X$ denote the outputs and inputs of all
samples from all domains
($\{\{(x_i, y_i) \in (\mathcal{X}_s \times \mathcal{Y}_s)\}_{i=1}^{N_s}\}_{s \in S_0}$),
respectively. Since $Z$ can take any continuous value, its prior
distribution $p(Z)$ can simply be assumed to be an isotropic
Gaussian distribution, which gives
$$\arg\min_{Z,\phi,\theta} \{\, -\ln p(Y \mid X, \phi, \theta, Z) + \tfrac{1}{2}\lVert Z \rVert^2 \,\}. \tag{4}$$
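The step from (3) to (4) can be made explicit. As a sketch, assume the prior is a standard isotropic Gaussian with unit variance over the $D$ stacked entries of $Z$ (the symbol $D$ and the unit variance are assumptions, not stated in the text):

```latex
\begin{aligned}
p(Z) &= (2\pi)^{-D/2} \exp\!\left(-\tfrac{1}{2}\lVert Z \rVert_2^2\right), \\
-\ln p(Z) &= \tfrac{1}{2}\lVert Z \rVert_2^2 + \tfrac{D}{2}\ln(2\pi).
\end{aligned}
```

The additive constant does not depend on $Z$, $\phi$, or $\theta$, so dropping it leaves the $\ell_2$ term in (4). A different prior variance would only rescale that term.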

Hence the first term is a downstream task-specific prediction
loss and the second term is an $\ell_2$ norm that regularizes $Z$.
The first term can be expressed more specifically
as $\sum_{s \in S_0} \mathrm{loss}(f_s(x_s), y_s)$, where the parameter $W_s$ of each
location $s$'s downstream predictive function $f_s$ is calculated as
$$W_s = d_\phi(z_s) = d_\phi(g_\theta(s; E, Z)). \tag{5}$$
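To make the two terms of the objective concrete, the following sketch evaluates it for given per-location losses and embeddings. The function name `objective` and the numbers are hypothetical; in practice the losses would come from the generated models $f_s$ and all quantities would be optimized jointly by gradient descent.

```python
def objective(per_location_losses, embeddings):
    """Summed task loss over seen locations plus 0.5 * squared L2 norm of Z."""
    task_loss = sum(per_location_losses)
    l2_penalty = 0.5 * sum(v * v for z in embeddings for v in z)
    return task_loss + l2_penalty

losses = [0.30, 0.10]            # hypothetical losses at two seen locations
Z = [[1.0, 0.0], [0.0, 2.0]]     # hypothetical 2-dimensional spatial embeddings
total = objective(losses, Z)     # 0.4 from the losses, 2.5 from the penalty
```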
In the following, we will more concretely introduce the pre-
diction and model parameter training of our overall framework.
Then in Section III-B2, we will detail our SIGNN model and
graph generator for generating the downstream-task model.
Lastly, in Section III-B3, we will drill down into our edge
representation.
2) Unseen domain model generator: In this subsection, we
ﬁrst introduce the details of our SIGNN model for doing the
spatial embedding interpolation for unseen locations and then
elaborate on the graph generator for generating the downstream
task model using the interpolated spatial embedding.
a) Spatial interpolation graph neural network (SIGNN):
As mentioned above, our SIGNN model gθ(s; E, Z) aims
at inferring the spatial embedding for a given location s,
based on other locations S0’s spatial embeddings and their
spatial correlation with s. A key challenge unique to spatial
interpolation beyond general message passing in graph neural
networks is how to comprehensively represent such correlation
among locations. Existing works typically consider only
the distances among the locations to represent their correlation
and thus cannot capture integrated spatial information, such as the
orientation of neighbors, which is indispensable for spatial
interpolation.
To achieve this, in our SIGNN we propose a novel edge
representation E(s, S0) which is detailed in section III-B3 and
here we ﬁrst introduce SIGNN and its convolutional operations
based upon the edge representation and spatial embedding.
SIGNN is a stack of $U$ spatial interpolation graph
convolutional layers $a_u$, $u = 1, 2, \ldots, U$, namely
$g_\theta = a_U \circ a_{U-1} \circ \cdots \circ a_1$, where the input to each spatial interpolation
graph convolutional layer is the target location together with the set of our
novel edge representations and spatial embeddings, namely
$(s; E, Z)$. Each spatial interpolation graph convolutional layer
interpolates the spatial embedding $z_s$ as its output, while the
edge representations remain the same for every layer.
In order to do the interpolation, the spatial interpolation
graph convolutional layer $a_u$ generates a pairwise weight $\omega_{ji}^{(u)}$
for each node $v_i$ and each of its neighbors $v_j \in \mathcal{N}_i^{(K)}$. The spatial
embedding of each node is then updated as a weighted
sum of the spatial embeddings of its neighboring nodes, namely
$$z_i^{(u+1)} = \sum_{j=1}^{K} \omega_{ji}^{(u)} z_j^{(u)},$$
where
$$\omega_{ji}^{(u)} = \frac{\exp\left(\sigma\left(\vec{\alpha}^{T} \left[ m_1(e_{ji}) \,\Vert\, m_2(z_i^{(u)}) \,\Vert\, m_2(z_j^{(u)}) \right]\right)\right)}{\sum_{k:\, v_k \in \mathcal{N}_i^{(K)}} \exp\left(\sigma\left(\vec{\alpha}^{T} \left[ m_1(e_{ki}) \,\Vert\, m_2(z_i^{(u)}) \,\Vert\, m_2(z_k^{(u)}) \right]\right)\right)}.$$
Here $e_{ji} \in E$ denotes the edge representation for edge $(v_j, v_i)$,
$z_i^{(u)}$ and $z_j^{(u)}$ denote the spatial embeddings of nodes $v_i$ and $v_j$ at
layer $a_u$, respectively, $m_1$ and $m_2$ denote two MLP models
that augment the edge representation and the spatial embedding,
respectively, $\Vert$ denotes the concatenation operation, $\sigma$ denotes
the nonlinear activation function LeakyReLU, and $\vec{\alpha}$ denotes a
vector parameter that transforms the concatenated vector to
a scalar. The softmax form normalizes the
weights over the neighbors of $v_i$. Finally, we select the spatial embedding $z_s$ for location
$s$ as the output.
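The weighting scheme of one interpolation layer can be illustrated with a small pure-Python sketch. It is not the released implementation: the MLPs m1 and m2 are replaced by identity maps for brevity, the zero placeholder embedding of the target node is an assumption, and the names `interpolate`, `nbrs`, and the numeric values are hypothetical.

```python
import math

def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x

def interpolate(z_i, neighbors, alpha):
    """One interpolation step for node i.

    neighbors: list of (e_ji, z_j) pairs, one per K-nearest neighbor of i.
    m1 and m2 are identity maps here (an assumption for brevity).
    """
    scores = []
    for e_ji, z_j in neighbors:
        concat = list(e_ji) + list(z_i) + list(z_j)   # [e_ji || z_i || z_j]
        scores.append(leaky_relu(sum(a * c for a, c in zip(alpha, concat))))
    # Softmax over the neighbors, shifted by the max for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [v / total for v in exps]
    # Weighted sum of the neighbor embeddings.
    dim = len(neighbors[0][1])
    z_new = [sum(w * z_j[d] for w, (_, z_j) in zip(weights, neighbors))
             for d in range(dim)]
    return z_new, weights

# Two neighbors with edge representation (distance, angle) and 2-d embeddings.
nbrs = [((0.5, 0.3), [1.0, 0.0]), ((2.0, -1.2), [0.0, 1.0])]
z_mean, w_uniform = interpolate([0.0, 0.0], nbrs, [0.0] * 6)
z_near, w_near = interpolate([0.0, 0.0], nbrs, [-1.0, 0.0, 0.0, 0.0, 0.0, 0.0])
```

With an all-zero scoring vector the layer reduces to a plain average of the neighbor embeddings. A negative coefficient on the distance entry shifts weight toward the closer neighbor, which is the kind of behavior the learned parameters can express.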
b) Downstream-task model generator: Many shallow
models such as linear regression, logistic regression, and support
vector machines manipulate the input vector with matrix
operations such as multiplication between input and weight
vectors. Such a matrix operation can be viewed as a
fully-connected layer or another type of layer, with or without a
nonlinear activation function. Deep neural networks
stack multiple such layers. Hence
each of these shallow or deep models for location $s$ can
be represented by its parameters, which are network structured.
Formally, we define such a network as an edge-weighted graph
$G = (V, E; W_s)$, where each node $v \in V$ corresponds to a neuron,
each edge $e \in E$ corresponds to a connection between two neurons,
and $W_s$ are the connection weights of the model at location $s$.
The model parameter $W_s$ is the output of our model
generator $d_\phi$: $W_s = d_\phi(z_s)$.
Following [14], we use a three-layer MLP to generate
the downstream-task model's weights $W_s$. The model then
loads the weights and performs the task.
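As an illustration of this idea, the sketch below decodes a spatial embedding into the coefficients of a linear regression model with a three-layer MLP. The layer sizes, the random initialization, the zero biases, and all function names are assumptions made for the example; the generator in the paper is trained jointly with SIGNN, which is omitted here.

```python
import random

def linear(x, W, b):
    return [sum(w * v for w, v in zip(row, x)) + bi for row, bi in zip(W, b)]

def relu(x):
    return [max(0.0, v) for v in x]

def init_layer(n_in, n_out, rng):
    W = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return W, [0.0] * n_out

def make_generator(embed_dim, hidden, n_params, seed=0):
    """Three-layer MLP mapping an embedding to a flat parameter vector."""
    rng = random.Random(seed)
    return [init_layer(embed_dim, hidden, rng),
            init_layer(hidden, hidden, rng),
            init_layer(hidden, n_params, rng)]

def generate_weights(generator, z):
    h = z
    for idx, (W, b) in enumerate(generator):
        h = linear(h, W, b)
        if idx < len(generator) - 1:   # no activation on the output layer
            h = relu(h)
    return h

def predict(weights, x):
    """Linear regression whose coefficients and bias come from the generator."""
    *coef, bias = weights
    return sum(c * v for c, v in zip(coef, x)) + bias

n_features = 3
gen = make_generator(embed_dim=2, hidden=4, n_params=n_features + 1)
W_a = generate_weights(gen, [0.1, -0.3])   # embedding of one location
W_b = generate_weights(gen, [0.8, 0.5])    # embedding of another location
y_a = predict(W_a, [1.0, 2.0, 3.0])
```

Because the downstream weights are a function of the embedding, two locations with different embeddings can obtain different predictive models from the same shared generator.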
3) Edge representation for spatial interpolation: In this section,
we introduce the edge representation for spatial interpolation,
inspired by [15]. The proposed edge representation $e_{ji}$ for an
edge $(v_j, v_i)$, where $v_i$ is the target node and $v_j$ is the source
node, can be expressed as
$$e_{ji} = (l_{ij}, \lambda_{ijk}), \tag{6}$$
where $v_k$ is the neighbor of $v_j$ that forms the smallest
$\lambda_{ijk} \in [-\pi, \pi)$,
$$
\begin{aligned}
\lambda_{ijk} &= \mathrm{Parity} \cdot \bar{\lambda}_{ijk}, \\
\bar{\lambda}_{ijk} &= \arccos\left(\left\langle \frac{s_{ij}}{l_{ij}}, \frac{s_{jk}}{l_{jk}} \right\rangle\right), \\
l_{ij} &= \lVert s_{ij} \rVert_2, \\
s_{ij} &= \bar{s}_j - \bar{s}_i, \\
\mathrm{Parity} &= \langle n_{ijk}, n_{xy} \rangle, \\
n_{ijk} &= \frac{s_{ij} \times s_{jk}}{\lVert s_{ij} \times s_{jk} \rVert_2}, \\
n_{xy} &= u_x \times u_y,
\end{aligned}
\tag{7}
$$
where $\bar{s}_i$ and $\bar{s}_j$ denote the coordinate values of two locations,
$u_x$ and $u_y$ are unit vectors along the horizontal and vertical
axes of the coordinate system on the plane of interest, $n_{xy}$ is
the normal of that plane, and $\times$ denotes the cross product
operation.
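For planar coordinates the quantities in (6) and (7) reduce to a length and a signed angle, since the parity is the sign of the z-component of the cross product. The sketch below assumes 2D inputs; the function names and the rule for collinear vectors (parity set to -1 so that an antiparallel pair maps to -π) are assumptions, as the text does not specify that case.

```python
import math

def signed_angle(s_i, s_j, s_k):
    """Signed angle between s_ij = s_j - s_i and s_jk = s_k - s_j."""
    ax, ay = s_j[0] - s_i[0], s_j[1] - s_i[1]
    bx, by = s_k[0] - s_j[0], s_k[1] - s_j[1]
    cos = (ax * bx + ay * by) / (math.hypot(ax, ay) * math.hypot(bx, by))
    unsigned = math.acos(max(-1.0, min(1.0, cos)))   # clamp against rounding
    cross_z = ax * by - ay * bx                      # z-component of s_ij x s_jk
    parity = 1.0 if cross_z > 0 else -1.0            # collinear case: -1 (assumption)
    return parity * unsigned

def edge_representation(s_i, s_j, neighbors_of_j):
    """Return (l_ij, lambda_ijk), with k chosen to give the smallest signed angle."""
    length = math.hypot(s_j[0] - s_i[0], s_j[1] - s_i[1])
    angle = min(signed_angle(s_i, s_j, s_k) for s_k in neighbors_of_j)
    return length, angle

# Target at the origin, source at (1, 0), and two neighbors of the source.
e = edge_representation((0.0, 0.0), (1.0, 0.0), [(1.0, 1.0), (1.0, -1.0)])
```

In this toy case the two candidate neighbors give angles of +π/2 and -π/2, and the smaller one is kept, so the edge carries both distance and direction information.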
IV. EXPERIMENT
In this section, we first introduce the experimental settings,
then compare the effectiveness of the proposed model
with comparison methods on ten real-world datasets. All
the experiments are conducted on a 64-bit machine with an
NVIDIA A5000 GPU.
A. Experiment setting
a) Dataset: We evaluate our method on ten real-world
datasets, including seven civil unrest event prediction datasets
and one inﬂuenza outbreak event prediction dataset extracted
from Twitter data for the classiﬁcation task, and two environ-
mental datasets collected by in-situ monitoring sensors and
satellites for the regression task.
• Civil unrest Twitter datasets: Seven civil unrest event
datasets from Brazil, Chile, Colombia, Ecuador, El Salvador,
Uruguay, and Venezuela are utilized to evaluate
the performance of the proposed model. Details of these
datasets can be found in [16], [17].
• Influenza outbreak Twitter dataset: Flu activities are
collected from 48 states in the U.S. in this dataset. Details
of this dataset can be found in [18]–[20]. We call this
dataset Flu in the following sections.
• PM2.5 concentration dataset: PM2.5 data in the Los
Angeles region are derived from the fusion of data collected by
PurpleAir sensors and the Moderate Resolution Imaging
Spectroradiometer (MODIS) TERRA and AQUA satellites
[21], together with the meteorological dataset from MERRA-2
reanalysis data [22]. The dataset contains latitude,
longitude, and meteorological values such as humidity,
surface pressure, and wind speed, along with the corresponding
ambient PM2.5 value at the location.
• Ambient temperature dataset: In-situ air temperature
was downloaded from Weather Underground, a network of
weather stations. Satellite-based land surface temperature
(LST) products derived from MODIS satellite observations
and meteorological variables were collocated to
estimate ambient temperature.
b) Comparison method: To the best of our knowledge,
there has been little work handling unseen spatial domains. The
following methods were included for comparing performances
on the collected datasets.
• ERM: A space-oblivious model trained on all
training domains using empirical risk minimization (ERM).
• IncFinetune: In this model, we incrementally train a
global model on all training domains by ﬁnetuning the
model on the training domains one at a time.
• GTWNN [23]: A geographically weighted neural network
consisting of two artiﬁcial neural networks with the ﬁrst
network estimating the spatial weight of each independent
variable from coordinate values.
B. Experimental performance
We adopt Area under the ROC Curve (AUC) score and
mean absolute error (MAE) as the metrics for classiﬁcation
and regression tasks, respectively.
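Both metrics are simple to compute. The following sketch uses the pairwise-ranking definition of AUC (the probability that a random positive sample is scored above a random negative one, with ties counted as one half); the function names and toy values are illustrative and do not reproduce the paper's evaluation code.

```python
def mae(y_true, y_pred):
    """Mean absolute error for regression."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

error = mae([1.0, 2.0, 3.0], [1.5, 2.0, 2.0])
score = auc([1, 0, 1, 0], [0.9, 0.4, 0.35, 0.1])
```

Lower MAE is better, while an AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking.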
1) Effectiveness results: Table I summarizes the performance
comparison between the proposed method and competing
models for civil unrest event forecasting, influenza outbreak
prediction, ambient PM2.5 concentration estimation, and temperature
estimation tasks. The results show that the proposed method
achieves the best performance on most datasets and
comparable performance on the others. This indicates that a
method that adapts to different locations can better model the
heterogeneous relationships between independent and
dependent variables as locations change. For example, on
the seven civil unrest event datasets, the proposed model has
the highest AUC scores in most countries, with Venezuela as an exception.
Specifically, the AUC scores of our model in Chile and Brazil
are much higher than those of the baseline models.
2) Ablation study: We further conduct an ablation study
on all ten datasets to evaluate the effectiveness of different
components in our proposed model. Firstly, we remove the
interpolation function in SIGNN and train a single global spatial
embedding for all the locations and use this spatial embedding
to generate the weights of the downstream-task model. We
name this version of our method SIGNN-G. The results of
this version are included in Table I.
As we can see, the interpolation function provided by SIGNN
contributes significantly to the overall model performance. The
difference in performance between SIGNN and SIGNN-G is an indicator
of the extent of heterogeneity in the spatial data. This further
implies that spatial heterogeneity exists in almost all the
datasets except Colombia and Ecuador, on which the average
performances of SIGNN and SIGNN-G are the same.
V. CONCLUSION
Spatial autocorrelation and spatial heterogeneity widely
exist in spatial data, which makes traditional machine
learning models perform poorly. Spatial domain generalization
is a spatial extension of domain generalization, which can
generalize to unseen spatial domains in continuous 2D space.
Specifically, it learns a model under varying data distributions
that generalizes to unseen domains. Although tremendous
success has been achieved in domain generalization, there exist
very few works on spatial domain generalization. This paper
proposes a generic framework for spatial domain generalization.
Specifically, we develop a spatial interpolation graph neural
network that handles spatial data as a graph and learns the
spatial embedding on each node and their relationships. The
spatial interpolation graph neural network infers the spatial
embedding of an unseen location during the test phase. Then
the spatial embedding of the target location is used to decode
the parameters of the downstream-task model directly on the
target location. Extensive experiments on ten real-world datasets
demonstrate the proposed method's strength. SIGNN achieves
the best performance on most of the datasets and comparable
performance on the others. The difference in performance
between SIGNN-G and SIGNN supports our assumption that spatial
heterogeneity exists in most spatial datasets.

TABLE I: Comparison of our proposed method against existing methods on all ten datasets in terms of MAE for the first two
datasets and AUC score for the others. The standard deviation over three runs follows the ± mark. We observe that our proposed
method outperforms almost all the baselines.

Dataset      Metric  ERM           IncFinetune   GTWNN         SIGNN-G       SIGNN
PM2.5        MAE     12.44 ± 4.64  13.73 ± 4.07  10.00 ± 0.58  9.66 ± 0.48   9.40 ± 0.46
Temperature  MAE     8.74 ± 1.23   11.13 ± 4.93  12.29 ± 7.81  7.41 ± 0.30   7.33 ± 0.28
Flu          AUC     0.84 ± 0.03   0.80 ± 0.03   0.75 ± 0.02   0.74 ± 0.05   0.84 ± 0.06
Brazil       AUC     0.53 ± 0.03   0.52 ± 0.03   0.59 ± 0.04   0.61 ± 0.08   0.65 ± 0.07
Chile        AUC     0.46 ± 0.04   0.44 ± 0.12   0.49 ± 0.07   0.57 ± 0.08   0.55 ± 0.05
Colombia     AUC     0.52 ± 0.08   0.44 ± 0.06   0.55 ± 0.07   0.56 ± 0.04   0.56 ± 0.11
Ecuador      AUC     0.47 ± 0.08   0.38 ± 0.13   0.47 ± 0.03   0.52 ± 0.08   0.52 ± 0.18
El Salvador  AUC     0.50 ± 0.07   0.51 ± 0.08   0.46 ± 0.07   0.52 ± 0.07   0.53 ± 0.20
Uruguay      AUC     0.48 ± 0.08   0.50 ± 0.10   0.39 ± 0.12   0.40 ± 0.17   0.54 ± 0.01
Venezuela    AUC     0.51 ± 0.03   0.55 ± 0.04   0.56 ± 0.05   0.60 ± 0.03   0.54 ± 0.03
ACKNOWLEDGMENT
This work was supported by the National Science Foundation
(NSF) Grant No. 1755850, No. 1841520, No. 2007716,
No. 2007976, No. 1942594, No. 1907805, a Jeffress Memo-
rial Trust Award, Amazon Research Award, NVIDIA GPU
Grant, and Design Knowledge Company (subcontract number:
10827.002.120.04).
REFERENCES
[1] K. Muandet, D. Balduzzi, and B. Schölkopf, “Domain generalization via
invariant feature representation,” in International Conference on Machine
Learning. PMLR, 2013, pp. 10–18.
[2] S. Ben-David, J. Blitzer, K. Crammer, A. Kulesza, F. Pereira, and
J. W. Vaughan, “A theory of learning from different domains,” Machine
learning, vol. 79, no. 1, pp. 151–175, 2010.
[3] A. Nasery, S. Thakur, V. Piratla, A. De, and S. Sarawagi, “Training for
the future: A simple gradient interpolation loss to generalize along time,”
Advances in Neural Information Processing Systems, vol. 34, 2021.
[4] D. C. Wheeler and A. Páez, “Geographically weighted regression,” in
Handbook of applied spatial analysis. Springer, 2010, pp. 461–486.
[5] Y. Ganin, E. Ustinova, H. Ajakan, P. Germain, H. Larochelle, F. Laviolette,
M. Marchand, and V. Lempitsky, “Domain-adversarial training of neural
networks,” The journal of machine learning research, vol. 17, no. 1, pp.
2096–2030, 2016.
[6] E. Tzeng, J. Hoffman, K. Saenko, and T. Darrell, “Adversarial discrimi-
native domain adaptation,” in Proceedings of the IEEE conference on
computer vision and pattern recognition, 2017, pp. 7167–7176.
[7] J. Hoffman, T. Darrell, and K. Saenko, “Continuous manifold based
adaptation for evolving visual domains,” in Proceedings of the IEEE
Conference on Computer Vision and Pattern Recognition, 2014, pp.
867–874.
[8] M. Mancini, S. R. Bulo, B. Caputo, and E. Ricci, “Adagraph: Unifying
predictive and continuous domain adaptation through graphs,” in Pro-
ceedings of the IEEE/CVF Conference on Computer Vision and Pattern
Recognition, 2019, pp. 6568–6577.
[9] H. Wang, H. He, and D. Katabi, “Continuously indexed domain
adaptation,” arXiv preprint arXiv:2007.01807, 2020.
[10] J. Wang, C. Lan, C. Liu, Y. Ouyang, W. Zeng, and T. Qin, “Generalizing
to unseen domains: A survey on domain generalization,” arXiv preprint
arXiv:2103.03097, 2021.
[11] J. Tobin, R. Fong, A. Ray, J. Schneider, W. Zaremba, and P. Abbeel,
“Domain randomization for transferring deep neural networks from
simulation to the real world,” in 2017 IEEE/RSJ international conference
on intelligent robots and systems (IROS). IEEE, 2017, pp. 23–30.
[12] F. Qiao, L. Zhao, and X. Peng, “Learning to learn single domain
generalization,” in Proceedings of the IEEE/CVF Conference on Computer
Vision and Pattern Recognition, 2020, pp. 12 556–12 565.
[13] W. Li, Z. Xu, D. Xu, D. Dai, and L. Van Gool, “Domain generalization
and adaptation using low rank exemplar svms,” IEEE transactions on
pattern analysis and machine intelligence, vol. 40, no. 5, pp. 1114–1127,
2017.
[14] K.-H. N. Bui, J. Cho, and H. Yi, “Spatial-temporal graph neural network
for trafﬁc forecasting: An overview and open research issues,” Applied
Intelligence, pp. 1–12, 2021.
[15] Z. Zhang and L. Zhao, “Representation learning on spatial networks,”
Advances in Neural Information Processing Systems, vol. 34, pp. 2303–
2318, 2021.
[16] L. Zhao, Q. Sun, J. Ye, F. Chen, C.-T. Lu, and N. Ramakrishnan, “Multi-
task learning for spatio-temporal event forecasting,” in Proceedings of the
21th ACM SIGKDD International Conference on Knowledge Discovery
and Data Mining, 2015, pp. 1503–1512.
[17] S. Muthiah, P. Butler, R. P. Khandpur, P. Saraf, N. Self, A. Rozovskaya,
L. Zhao, J. Cadena, C.-T. Lu, A. Vullikanti et al., “Embers at 4 years:
Experiences operating an open source indicators forecasting system,” in
Proceedings of the 22nd ACM SIGKDD International Conference on
Knowledge Discovery and Data Mining, 2016, pp. 205–214.
[18] L. Zhao, Q. Sun, J. Ye, F. Chen, C.-T. Lu, and N. Ramakrishnan,
“Feature constrained multi-task learning models for spatiotemporal event
forecasting,” IEEE Transactions on Knowledge and Data Engineering,
vol. 29, no. 5, pp. 1059–1072, 2017.
[19] L. Zhao, J. Chen, F. Chen, W. Wang, C.-T. Lu, and N. Ramakrishnan,
“Simnest: Social media nested epidemic simulation via online semi-
supervised deep learning,” in 2015 IEEE international conference on
data mining. IEEE, 2015, pp. 639–648.
[20] L. Zhao, “Event prediction in the big data era: A systematic survey,”
ACM Computing Surveys (CSUR), vol. 54, no. 5, pp. 1–37, 2021.
[21] R. Levy, S. Mattoo, L. Munchak, L. Remer, A. Sayer, F. Patadia, and
N. Hsu, “The collection 6 modis aerosol products over land and ocean,”
Atmospheric Measurement Techniques, vol. 6, no. 11, pp. 2989–3034,
2013.
[22] R. Gelaro, W. McCarty, M. J. Suárez, R. Todling, A. Molod, L. Takacs,
C. A. Randles, A. Darmenov, M. G. Bosilovich, R. Reichle et al., “The
modern-era retrospective analysis for research and applications, version
2 (merra-2),” Journal of climate, vol. 30, no. 14, pp. 5419–5454, 2017.
[23] L. Feng, Y. Wang, Z. Zhang, and Q. Du, “Geographically and temporally
weighted neural network for winter wheat yield prediction,” Remote
Sensing of Environment, vol. 262, p. 112514, 2021.
